Abstract. Rule-based policy and contract systems have rarely been studied in terms of
their software engineering properties. This is a serious omission, because in rule-based
policy or contract representation languages rules are used as a declarative programming
language to formalize real-world decision logic and to build production information
systems (IS) on top of it. This paper adopts an SE methodology from extreme programming,
namely test-driven development, and discusses how it can be adapted to verification,
validation and integrity testing (V&V&I) of policy and contract specifications. Since the
test-driven approach focuses on the behavioral aspects and the drawn conclusions instead
of the structure of the rule base and the causes of faults, it is independent of the
complexity of the rule language and the system under test and thus much easier to use
and understand for the rule engineer and the user.
1 Test-driven V&V for Rule-based Policies and Contracts

Increasing interest in industry and academia in higher-level policy and contract
languages has led to much recent development. Different representation approaches have
been proposed, ranging from general syntactic XML markup languages such as WS-Policy,
WS-Agreement or WSLA to semantically rich (ontology-based) policy representation
languages such as Rei, KAoS or Ponder and highly expressive rule-based contract
languages such as the RBSLA language or the SweetRules approach. In this paper
we adopt the rule-based view on expressive high-level policy and contract languages 
for representing e.g. SLAs, business policies and other contractual, business-oriented 
decision logic. In particular, we focus on logic programming techniques. Logic program- 
ming has been one of the most successful representatives of declarative programming. 
It is based on solid and well-understood theoretical concepts and has been proven to be 
very useful for rapid prototyping and describing problems on a high abstraction level. 
In particular, the domain of contractual agreements, high-level policies and business 
rules' decision logic appears to be highly suitable to logic programming. For instance, 
IT service providers need to manage and possibly interchange large amounts of SLAs 
/ policies / business rules which describe behavioral, contractual or business logic
using different rule types to describe e.g. complex conditional decision logic (derivation
rules), reactive or even proactive behavior (ECA rules), normative statements and legal
rules (deontic rules), integrity definitions (integrity constraints) or defaults, rule
preferences and exceptions (non-monotonic defeasible rules). Such rule types have been
shown to be adequately represented and formalized as logic programs (LPs) - see the
ContractLog KR developed in the RBSLA project. However, the rule-based policy
and contract representation imposes some specific needs on the engineering and
life-cycle management of the formalized specifications: The policy rules must necessarily
be modelled in an evolutionary manner, in close collaboration between domain experts, rule
engineers and practitioners, and the statements are not of static nature and need to
be continuously adapted to changing needs. The future growth of policies or contract
specifications, where rules are often managed in a distributed way and are interchanged 
between domain boundaries, will be seriously obstructed if developers and providers 
do not firmly face the problem of quality, predictability, reliability and usability w.r.t. 
understandability of the results produced by their rule-based policy/contract systems
and programs. Furthermore, the derived conclusions and results need to be highly
reliable and traceable to count even in the legal sense. This calls for verification,
validation and integrity testing (V&V&I) techniques which are much simpler than the
rule-based specifications themselves, but nevertheless adequate (expressive enough) to
approximate their intended semantics, determine the reliability of the produced results,
ensure the correct execution in a target inference environment and safeguard the life
cycle of possibly distributed and unitized rules in rule-based policy projects which are
likely to change frequently.

Different approaches and methodologies to V&V of rule-based systems have been pro- 
posed in the literature such as model checking, code inspection or structural debugging. 
Simple operational debugging approaches which instrument the policy/contract rules
and explore their execution traces place a huge cognitive load on the user, who needs
to analyze each step of the conclusion process and to understand the structure
of the rule system under test. On the other hand, typical heavy-weight V&V
methodologies in Software Engineering (SE) such as waterfall-based approaches are 
often not suitable for rule-based systems, because they induce high costs of change 
and do not facilitate evolutionary modelling of rule-based policies with collaborations 
of different roles such as domain experts, system developers and knowledge engineers. 
Moreover, they cannot check the dynamic behavior and the interaction between
dynamically updated and interchanged policies/contracts and target execution
environments at runtime. Model-checking techniques and methods based e.g. on algebraic,
graph-based or Petri-net-based interpretations are computationally very costly, inapplicable
to expressive policy/contract rule languages and presuppose a deep understanding of
both domains, i.e. of the testing language/models and of the rule language and
the rule inferences. Although test-driven Extreme Programming (XP) techniques and
similar approaches to agile SE have been very successful in recent years and are widely
used among mainstream software developers, their values, principles and practices have
not yet been transferred to the rule-based policy and contract representation community.
V&V was an important area of research in the expert-system and knowledge
engineering community from the mid '80s to the early '90s, mainly applying debugging
techniques or transformation approaches into analytical models such as graphs or
algebraic structures. However, to the best of our knowledge almost no work has been done
on building on these results for V&V of rule-based policy/contract specifications and
on adopting recent trends in SE to the domain of policy engineering. In this paper, we
adapt a successful methodology of XP, namely test cases (TCs), to verify and validate
correctness, reliability and adequacy of rule-based policy and contract specifications. 
It is well understood in the SE community that test-driven development improves the 
quality and predictability of software releases and we argue that TCs and integrity 
constraints (ICs) also have a huge potential to be a successful tool for declarative V&V 
of rule-based policy and contract systems. TCs are well suited to the frequently changing
requirements and models of rule-based policies and contracts (e.g. SLAs) when they are
combined with other SE methodologies: test coverage measurement, which quantifies the
completeness of TCs as part of the feedback loop in the development process, and rule
base refinements (a.k.a. refactorings) [19], which optimize the existing rule code, e.g.
by removing inconsistencies or redundancy or by adding missing knowledge, without
breaking its functionality. Due to their inherent simplicity, TCs, which provide an
abstracted black-box view on the rules, better support the different roles which are involved
during the engineering process. In our approach TCs are written homogeneously in the
target programming language, i.e. in the contract/policy rule language, so that they 
can be managed, maintained and distributed together with the policies/contracts.
Using TCs and ICs to represent the constraints which describe the intended semantics
of a policy/contract specification gives policy engineers an expressive but nevertheless
easy-to-use testing language and makes policies self-validating to a large extent.
In a feedback loop changing requirements and detected faults (bugs) are translated 
into new TCs, and the policy specification is then modified until the old and the new 
TCs succeed. This also helps to avoid atrophy of the rule code and the TCs when the 
policies are dynamically changed and extended. During rule interchange in open
distributed environments TCs can be used to ensure correct execution of an interchanged
LP in a target execution environment by validating the interchanged rules with the
attached TCs. The remainder of the paper is structured as follows: In section 2 we review basics
in V&V research. In section 3 we define syntax and semantics of TCs and ICs for LP 
based policy /contract specifications. In section 4 we introduce a declarative test cover- 
age measure which draws on inductive logic programming techniques. In section 5 we 
discuss TCs for V&V of rule engines and rule interchange. In section 6 we describe our 
reference implementation in the ContractLog KR and integrate our approach into an 
existing SE test framework (JUnit) and a rule markup language (RuleML). In section 
7 we discuss related work and conclude this paper with a discussion of the test-driven
V&V&I approach for rule-based policies and contracts.



2 Basics in Rule-based V&V Research 

V&V of rule-based policy/contract specifications is vital to assure that the LP used 
to formalize the policy/contract rules performs the tasks which it was designed for. 
Accordingly, the term V&V is used as a rough synonym for "evaluation and testing". 
Both processes guarantee that the LP provides the intended answer, but also imply 
other goals such as to assure the security or maintenance and service of the rule-based 
system. There are many definitions of V&V in the SE literature. In the context of V&V 
of rule-based policies/contracts we use the following: 

1. Verification ensures the technical correctness of a LP. Akin to traditional software engineering 
a distinction between structurally flawed or logically flawed rule bases can be made with structural 
checks for redundancy or relevance and semantic checks for consistency, soundness and completeness. 

2. As discussed by Gonzales, validation should not be confused with verification. Validation is
concerned with the logical correctness of a rule-based system in a particular environment/situation
and domain. Typically, validation is based on tests, desirably in the real environment and under real
circumstances, where the rule base is considered as a "black box" which produces certain outputs
(answers to queries) given a set of input data (assertions represented as facts).

During runtime certain parts of the rule based decision logic should be static and not 
subjected to changes or it must be assured that updates do not change this part of the 
intended behavior of the policy/contract. A common way to represent such constraints
is to use ICs. Roughly, if validation is interpreted as "Are we building the right product?"
and verification as "Are we building the product right?", then integrity might be loosely
defined as "Are we keeping the product right?", leading to the new pattern: V&V&I.
Hence, ICs are a way to formulate consistency (or inconsistency) criteria of a dynami- 
cally updated knowledge base (KB). Another distinction which can be made is between 
errors and anomalies: 

- Errors represent problems which directly affect the operations of a rule base. The simplest source of
errors are typographical mistakes, which can be caught by a verifying parser. More complex problems
arise in case of large rule bases involving several people during design and maintenance and in
case of the dynamic alteration of the rule base via adding, changing or refining the knowledge, which
might easily lead to incompleteness and contradictions.

- Anomalies are considered as symptoms of genuine errors, i.e. they may not necessarily represent
problems in themselves.

Much work has been done to establish and classify the nature of errors and anomalies
that may be present in rule bases, see e.g. the taxonomy of anomalies from Preece
and Shinghal. A general distinction can be made between errors/anomalies concerned
with the design of rule bases and those concerned with the inferences. Typical
inference errors are e.g. redundant rules, circular rules or dead-end rules (in forward
chaining systems). Typical design errors/anomalies are e.g. duplication, inconsistency
or subsumedness. For a detailed discussion of potential errors and anomalies that may
occur in rule bases we refer to the literature. Here, we briefly review the notions that are
commonly used in the literature [8, 7, 9], which range from semantic checks for consistency and
completeness to structural checks for redundancy, relevance and reachability:

1. Consistency: No conflicting conclusions can be made from a set of valid input data. The common 
definition of consistency is that two rules or inferences are inconsistent if they succeed at the same 
knowledge state, but have conflicting results. Several special cases of inconsistent rules are consid- 
ered in literature such as: 

- self-contradicting rules and self-contradicting rule chains, e.g. $p \wedge q \rightarrow \neg p$

- contradicting rules and contradicting rule chains, e.g. $p \wedge q \rightarrow s$ and $p \wedge q \rightarrow \neg s$

Note that the first two cases of self-contradiction are not consistent in a semantic sense and can
equally be seen as redundant rules, since they can never be concluded.

2. Correctness/Soundness: No invalid conclusions can be inferred from valid input data, i.e. a rule
base is correct when it holds for any complete model $M$ that the outputs inferred from valid inputs
via the rule base are true in $M$. This is closely related to soundness, which checks that the intended
outputs indeed follow from the valid input. Note that in case of partial models with only partial
information this means that all possible partial models need to be verified instead of only the complete
models. However, for monotonic inferences these notions coincide and a rule base which is sound is
also consistent.

3. Completeness: No valid input information fails to produce the intended output conclusions, i.e. 
completeness relates to gaps (incomplete knowledge) in the knowledge base. The iterative process 
of building large rule bases where rules are tested, added, changed and refined obviously can leave 
gaps such as missing rules in the knowledge base. This usually results in intended derivations which 
are not possible. Typical sources of incompleteness are missing facts or rules which prevent intended
conclusions from being drawn. But there are also other sources. A KB having too many rules and too many
input facts negatively influences performance and may lead to incompleteness due to termination
problems or memory overflows. Hence, superfluous rules and non-terminating rule chains can also be
considered as completeness problems, e.g.:

- Unused rules and facts, which are never used in any rule/query derivation (backward reasoning) 
or which are unreachable or dead-ends (forward reasoning). 

- Redundant rules such as identical rules or rule chains, e.g. $p \rightarrow q$ and $p \rightarrow q$.

- Subsumed rules, a special case of redundant rules, where two rules have the same rule head but
one rule contains more prerequisites (conditions) in the body, e.g. $p \wedge q \rightarrow r$ and $p \rightarrow r$.

- Self-contradicting rules, such as $p \wedge q \wedge \neg p \rightarrow r$ or simply $p \rightarrow \neg p$, which can never succeed.

- Loops in rules or rule chains, e.g. $p \wedge q \rightarrow q$ or tautologies such as $p \rightarrow p$.
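
Several of the structural anomalies listed above can be detected by purely syntactic checks. The following is a minimal sketch for propositional rules only, not part of the approach described in this paper. The encoding (a rule as a pair of body set and head, a leading "-" for negation) and the names `negate`, `find_anomalies` and `rule` are assumptions made for illustration, and rule chains are not followed.

```python
from itertools import combinations

# Assumed encoding: a rule is (body, head); body is a frozenset of literals,
# head is a single literal. A leading "-" marks a negated literal.

def negate(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def find_anomalies(rules):
    """Report single-rule and pairwise anomalies (rule chains are not followed)."""
    found = []
    for i, (body, head) in enumerate(rules):
        # p, -p in the same body, or a head that negates a body literal
        if any(negate(l) in body for l in body) or negate(head) in body:
            found.append(("self-contradicting", i))
        # the head already occurs in the body (loop / tautology)
        if head in body:
            found.append(("loop", i))
    for (i, (b1, h1)), (j, (b2, h2)) in combinations(enumerate(rules), 2):
        if h1 == h2 and b1 == b2:
            found.append(("redundant", i, j))
        elif h1 == h2 and (b1 < b2 or b2 < b1):
            found.append(("subsumed", i, j))       # one body is a strict subset
        elif b1 == b2 and h1 == negate(h2):
            found.append(("contradicting", i, j))
    return found

def rule(body, head):
    return (frozenset(body.split()), head)

rules = [
    rule("p q", "r"),    # 0
    rule("p", "r"),      # 1: makes rule 0 a subsumed rule
    rule("p q", "-p"),   # 2: self-contradicting
    rule("p q", "q"),    # 3: loop
    rule("p q", "s"),    # 4
    rule("p q", "-s"),   # 5: contradicts rule 4
    rule("p", "r"),      # 6: identical to rule 1
]
anomalies = find_anomalies(rules)
```

Such checks look at the structure of the rule base; the test-driven approach developed in the following sections instead looks at the conclusions the rule base produces.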



3 Homogeneous Integration of Test Cases and Integrity 
Constraints into Logic Programs 

The relevance of V&V of rule bases and LPs has been recognized in the past (see
sections 2 and 7) and most recently also in the context of policy explanations [10].
The majority of these approaches rely on debugging the derivation trees and giving 
explanations (e.g. via spy and trace commands) or transforming the program into 
other more abstract representation structures such as graphs, Petri nets or algebraic
structures which are then analyzed for inconsistencies. Typically, the definition of an 
inconsistency, error or anomaly (see section 2) is then given in the language used for 
analyzing the LP, i.e. the V&V information is not expressed in the same representation 
language as the rules. This is in strong contrast to the way people would like to engineer, 
manage and maintain rule-based policies and systems. Different skills for writing LPs 
and analyzing them are needed as well as different systems for reasoning with rules 
and for V&V. Moreover, the used V&V methodologies (e.g. model checking or graph 
theory) are typically much more complicated than the rule-based programs. In fact, it 
turns out that even writing rule-based systems that are useful in practice is already 
of significant complexity, e.g. due to non-monotonic features or different negations, 
and that simple methods are needed to safeguard the engineering and maintenance 
process w.r.t. V&V&I. Therefore, what policy engineers and practitioners would like 
to have is an "easy-to-use" approach that allows representing rules and tests in the 
same homogeneous representation language, so that they can be engineered, executed, 
maintained and interchanged together using the same underlying syntax, semantics, 
methodologies and execution/inference environment. In this section we elaborate on
this homogeneous integration approach based on the common "denominator": extended
logic programming.

In the following we use the standard LP notation with an ISO Prolog related
scripting syntax called Prova [11] and we assume that the reader is familiar with logic
programming techniques [12]. For the semantics of the knowledge base we adopt a
rather general definition [13] of LP semantics which possibly also includes some form of
non-monotonic reasoning, because our test-driven approach is intended to be general
and applicable to several logic classes / rule languages (e.g. propositional, DataLog,
normal, extended) in order to fulfill the different KR needs of particular policy or
contract representation projects (e.g. w.r.t. expressiveness and computational complexity,
which are in a trade-off relation to each other). In particular, as we will show in section
5, TCs can also be used to verify the possibly unknown semantics of a target inference
service in an open environment such as the (Semantic) Web and test the correct
execution of an interchanged policy/contract in the target environment.

- A semantics $SEM(P)$ of a LP $P$ is proof-theoretically defined as a set of literals that are
derivable from $P$ using a particular derivation mechanism, such as linear SLD(NF)-resolution
variants with the negation-as-finite-failure rule or non-linear tabling approaches such as SLG
resolution. Model-theoretically, a semantics $SEM(P)$ of a program $P$ is a subset of all models of $P$:
$MOD(P)$. In this paper it is in most cases a subset of the (3-valued) Herbrand models of the language
$L_P$: $SEM(P) \subseteq MOD^{L_P}_{Herb}(P)$. Associated with $SEM(P)$ are two entailment relations:

1. sceptical, where the set of all atoms or default atoms are true in all models of $SEM(P)$

2. credulous, where the set of all atoms or default atoms are true in at least one model of $SEM(P)$

- A semantics $SEM'$ extends a semantics $SEM$, denoted by $SEM' > SEM$, if for all programs
$P$ and all atoms $l$ the following holds: $SEM(P) \models l \Rightarrow SEM'(P) \models l$, i.e. all atoms derivable
from $SEM$ with respect to $P$ are also derivable from $SEM'$, but $SEM'$ derives more true or false
atoms than $SEM$. The semantics $SEM'$ is defined for a class of programs that strictly includes
the class of programs with the semantics $SEM$. $SEM'$ coincides with $SEM$ for all programs of
the class of programs for which $SEM$ is defined.
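
The difference between the two entailment relations can be illustrated with a small sketch. It assumes, for illustration only, that a semantics is given explicitly as a finite list of models and that each model is the set of atoms true in it; the function names and the example atoms are hypothetical.

```python
def sceptical(models, atom):
    """The atom is entailed if it is true in all models of the semantics."""
    return all(atom in model for model in models)

def credulous(models, atom):
    """The atom is entailed if it is true in at least one model."""
    return any(atom in model for model in models)

# Two assumed models of some program P
models = [{"gold", "discount"}, {"gold"}]
```

Here `gold` is entailed under both viewpoints, whereas `discount` is entailed only credulously.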

In our ContractLog reference implementation we mainly adopt the sceptical viewpoint
on extended LPs and apply an extended linear SLDNF variant as procedural
semantics which has been extended with explicit negation, goal memoization and loop 
prevention to overcome typical restrictions of standard SLDNF and compute WFS (see 
ContractLog inference engine). 

The general idea of TCs in SE is to predefine the intended output of a program 
or method and compare the intended results with the derived results. If both match, 
the TC is said to capture the intended behavior of the program/method. Although 
there is no 100% guarantee that the TCs defined for V&V of a program exclude every
unintended result of the program, they are an easy way to approximate correctness and
other SE-related quality goals (in particular when the TCs and the program are refined 
in an evolutionary, iterative process with a feedback loop). In logic programming we 
think of a LP as formalizing our knowledge about the world and how the world behaves. 
The world is defined by a set of models. The rules in the LP constrain the set of possible 
models to the set of models which satisfy the rules w.r.t the current knowledge base 
(actual knowledge state). A query $Q$ to the LP is typically a conjunction of literals
(positive or negative atoms) $G_1 \wedge \dots \wedge G_n$, where the literals $G_i$ may contain variables.
Asking a query $Q$ to the LP then means asking for all possible substitutions $\theta$ of
the variables in $Q$ such that $Q\theta$ logically follows from the LP $P$, i.e. $P \models Q\theta$. The
substitution set $\theta$ is said to be the answer to the query, i.e. it is the output of the
program P. Hence, following the idea of TCs, for V&V of a LP P we need to predefine 
the intended outputs of P as a set of (test) queries to P and compare it with the 
actual results / answers derived from P by asking these test queries to P. Obviously, 
the set of possible models of a program might be quite large (even if many constraining 
rules exist), e.g. because of a large fact base or infinite functions. As a result the set
of test queries needed to test the program and to verify and validate the actual models of $P$ would
in the worst case also be infinite. However, we claim that most of the time correctness
of a set of rules can be assured by testing a much smaller subset of these models. In
particular, as we will see in the next section, in order to be an adequate cover for a LP
the tests need to be only a least general instantiation (specialization) of the rules' terms
(arguments) in order to fully investigate and test all rules in $P$. This also supports our
second claim, that V&V of LPs with TCs can almost always be done in reasonable time,
due to the fact that the typical test query is a ground query (without variables) which
has a small search space (as compared to queries with free variables) and only proves
existence of at least one model satisfying it. In analogy to TCs in SE we define a TC
as $TC := \{A, T\}$ for a LP $P$ to consist of:

1. a possibly empty set of input assertions $A$, being the set of temporarily asserted test input
facts (and additionally meta test rules - see section 5) defined over the alphabet $L$. The assertions
are used to temporarily set up the test environment. They can e.g. be used to define test facts,
result values of (external) functions called by procedural attachments, events and actions for testing
reactive rules or additional meta test rules.

2. a set of one or more tests $T$. Each test $T_i$, $i > 0$, consists of:

- a test query $Q$ with goal literals of the form $q(t_1, .., t_n)?$, where $Q \in rule(P)$ and $rule(P)$ is the
set of literals in the head of rules (since only rules need to be tested)

- a result $R$ being either a positive "true", negative "false" or "unknown" label.

- an intended answer set $\theta$ of expected variable bindings for the variables of the test query $Q$:
$\theta := \{X_1, .., X_n\}$ where each $X_i$ is a set of variable bindings $\{X_i/a_1, .., X_i/a_n\}$. For ground test
queries $\theta := \emptyset$.

We write a TC $T$ as follows: $T = A \cup \{Q \Rightarrow R : \theta\}$. If a TC has no assertions
we simply write $T = \{Q \Rightarrow R : \theta\}$. For instance,
$T1 = \{p(X) \Rightarrow true : \{X/a, X/b, X/c\}, q(Y) \Rightarrow false\}$
defines a TC $T1$ with two test queries $p(X)?$ and $q(Y)?$. The query $p(X)?$ should succeed and
return three answers $a$, $b$ and $c$ for the free variable $X$. The query $q(Y)?$ should fail. In case
we are only interested in the existential success of a test query we shorten the notation of a
TC to $T = \{Q \Rightarrow R\}$.
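
The structure of such a TC can be sketched in a few lines. This is an illustration under strong assumptions, not the ContractLog implementation: the KB is taken to be a plain set of ground facts, so a fact lookup stands in for the inference engine, only the labels "true" and "false" are modelled (as booleans), and all names (`is_var`, `answers`, `run_test_case`) are hypothetical.

```python
def is_var(term):
    # Assumed convention: variables start with an upper-case letter, as in Prolog.
    return term[:1].isupper()

def answers(facts, query):
    """Return all variable bindings under which the query matches a ground fact."""
    found = []
    for fact in facts:
        if len(fact) != len(query) or fact[0] != query[0]:
            continue
        binding = {}
        for q, f in zip(query[1:], fact[1:]):
            if is_var(q):
                if binding.setdefault(q, f) != f:   # same variable, different value
                    break
            elif q != f:
                break
        else:
            found.append(binding)
    return found

def run_test_case(kb, assertions, tests):
    """Run a TC (A, T); the assertions A are only visible while the TC runs."""
    facts = set(kb) | set(assertions)
    report = []
    for query, expected_result, expected_answers in tests:
        got = answers(facts, query)
        ok = bool(got) == expected_result
        if ok and expected_answers is not None:
            ok = ({frozenset(b.items()) for b in got}
                  == {frozenset(b.items()) for b in expected_answers})
        report.append(ok)
    return report

# T1 = {p(X) => true : {X/a, X/b, X/c}, q(Y) => false}
kb = {("p", "a"), ("p", "b")}
t1 = [
    (("p", "X"), True, [{"X": "a"}, {"X": "b"}, {"X": "c"}]),
    (("q", "Y"), False, None),
]
report = run_test_case(kb, {("p", "c")}, t1)   # p(c) is a temporary test assertion
```

With the assertion `p(c)` both tests succeed. Without it the first test fails, because the answer `X/c` is missing from the derived answers.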

To formulate runtime consistency criteria w.r.t. conflicts which might arise due to 
knowledge updates, e.g. adding rules, we apply ICs: 

An IC on a LP is defined as a set of conditions that the constrained KB must satisfy in order
to be considered as a consistent model of the intended (real-world, domain-specific) model.
Satisfaction of an IC is the fulfillment of the conditions imposed by the constraint, and violation of
an IC is the failure to strictly fulfill the conditions imposed by the constraint, i.e.
satisfaction resp. violation on a program (LP) $P$ w.r.t. the set $IC := \{ic_1, .., ic_n\}$ defined in $P$
is the satisfaction of each $ic_i \in IC$ at each KB state $P' := P \cup M_i \Rightarrow P \cup M_{i+1}$ with $M_0 = \emptyset$,
where $M_i$ is an arbitrary knowledge update adding, removing or changing rules or facts of the
dynamically extended or reduced KB.

Accordingly, ICs are closely related to our notion of TCs for LPs. In fact, TCs can be
seen as more expressive ICs. From a syntactical perspective we distinguish ICs from
TCs, since in our (ContractLog) approach we typically represent and manage TCs as
stand-alone LP scripts (module files) which are imported to the KB, whereas ICs are
defined as LP functions. Both internal ICs and external TCs can be used to define
conditions which denote a logic- or application-specific conflict. ICs in ContractLog are
defined as an n-ary function integrity(<operator>, <conditions>). We distinguish
four types of ICs:

- Not-constraints which express that none of the stated conclusions should be drawn.

- Xor-constraints which express that the stated conclusions should not be drawn at the same time.

- Or-constraints which express that at least one of the stated conclusions must be drawn.

- And-constraints which express that all of the stated conclusions must be drawn.

ICs are defined as constraints on the set of possible models and therefore describe 
the model(s) which should be considered as strictly conflicting. Model theoretically 
we attribute a 2- valued truth value (true/false) to an IC and use the defined set of 
constraints (literals) in an IC as a goal on the program P, by meta interpretation (as 
procedural semantics) of the integrity functions. In short, the truth of an IC in a finite
interpretation $I$ is determined by running the goal $G_{IC}$ defined by the IC on the clauses
in $P$ or more precisely on the actual knowledge state $P_i$. If $G_{IC}$ is satisfied, i.e.
there exists at least one model for the sentence formed by $G_{IC}$: $P_i \models G_{IC}$, the
IC is violated and $P$ is proven to be in an inconsistent state w.r.t. the IC: the IC is violated
resp. $P_i$ violates integrity iff for any interpretation $I$, $I \models P_i \Rightarrow I \models G_{IC}$. We define
the following interpretation for ICs:

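And: $and(C_1, .., C_n)$: $P_i \models (not\ C_1 \vee .. \vee not\ C_n)$ if there exists $i \in 1, .., n$ with $P_i \models not\ C_i$

Not: $not(C_1, .., C_n)$: $P_i \models (C_1 \vee .. \vee C_n)$ if there exists $i \in 1, .., n$ with $P_i \models C_i$

Or: $or(C_1, .., C_n)$: $P_i \models (not\ C_1 \wedge .. \wedge not\ C_n)$ if for all $i \in 1, .., n$, $P_i \models not\ C_i$

Xor: $xor(C_1, .., C_n)$: $P_i \models (C_j \wedge C_k)$ if there exists $j \in 1, .., n$ with $P_i \models C_j$ and there exists $k \in 1, .., n$ with $P_i \models C_k$,
where $C_j \neq C_k$ and $C_j \in C$, $C_k \in C$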

$C := \{C_1, .., C_n\}$ are positive or negative (explicitly negated) n-ary atoms which might
contain variables; not is used in the usual sense of default negation, i.e. if a constraint
literal cannot be proven true, it is assumed to be false. If there exists a model for an
IC goal (as defined above), i.e. the "integrity test goal" is satisfied, $P_i \models G_{IC}$, the IC
is assigned true and hence integrity is violated in the actual knowledge/program state
$P_i$.
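
The four interpretations can be read as simple checks against the set of literals derivable in the current knowledge state. The sketch below assumes that this set has already been computed by some inference engine and that default negation means "not in the set"; the function name `violated` and the example literals are hypothetical and do not reproduce the ContractLog meta interpreter.

```python
def violated(operator, conditions, derivable):
    """Return True if the IC integrity(operator, conditions) is violated
    in a knowledge state whose derivable ground literals are given."""
    holds = [c in derivable for c in set(conditions)]
    if operator == "not":    # none of the conclusions may be drawn
        return any(holds)
    if operator == "and":    # all conclusions must be drawn
        return not all(holds)
    if operator == "or":     # at least one conclusion must be drawn
        return not any(holds)
    if operator == "xor":    # conclusions must not be drawn at the same time
        return sum(holds) >= 2
    raise ValueError("unknown IC operator: %r" % operator)

# Knowledge state before and after an assumed update M adding blocked(moor)
state_0 = {"gold(moor)", "discount(moor)"}
state_1 = state_0 | {"blocked(moor)"}
ic = ("xor", ["discount(moor)", "blocked(moor)"])

before = violated(ic[0], ic[1], state_0)
after = violated(ic[0], ic[1], state_1)
```

Re-running the check after each knowledge update is what keeps the program state consistent: here the IC holds in the first state and is violated once the update makes both conclusions derivable.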

4 Declarative Test Coverage Measurement 

Test coverage is an essential part of the feedback loop in the test-driven engineering 
process. The coverage feedback highlights aspects of the formalized policy/contract 
specification which may not be adequately tested and which require additional testing. 
This loop will continue until coverage of the intended models of the formalized policy 
specification meets an adequate approximation level by the TC resp. test suites (TS) 
which bundle several TCs. Moreover, test coverage measurement helps to avoid atrophy
of TSs when the rule-based specifications are extended in an evolutionary manner. Measuring
coverage helps to keep the tests up to a required level if new rules are added or existing
rules are removed/changed. However, conventional testing methods for imperative
programming languages rely on the control flow graph as an abstract model of the program
or on the explicitly defined data flow and use coverage measures such as branch or
path coverage. In contrast, the proof-theoretic semantics of LPs is based on resolution
with unification and backtracking, where no explicit control flow exists and goals are
used in a refutation attempt to specialize the rules in the declarative LP by unifying
them with the rule heads. Accordingly, building upon this central concept of unification,
a test covers a logic program $P$ if the test queries (goals) lead to a least general
specialization of each rule in $P$, such that the full scope of terms (arguments) of each
literal in each rule is investigated by the set of test queries.

Inductively deriving general information from specific knowledge is a task which is
approached by inductive logic programming (ILP) techniques which allow computing
the least general generalization (lgg), i.e. the most specific clause (e.g. w.r.t. theta
subsumption) covering two input clauses. An lgg is the generalization that keeps a
generalized term $t$ (or clause) as special as possible so that every other generalization
would increase the number of possible instances of $t$ in comparison to the possible
instances of the lgg. Efficient algorithms based on syntactical anti-unification with
$\theta$-subsumption ordering for the computation of the (relative) lgg(s) exist and several
implementations have been proposed in ILP systems such as GOLEM or FOIL.
$\theta$-subsumption introduces a syntactic notion of generality: A rule (clause) $r$ (resp. a term
$t$) $\theta$-subsumes another rule $r'$ if there exists a substitution $\theta$ such that $r\theta \subseteq r'$, i.e. a
rule $r$ is at least as general as the rule $r'$ ($r \leq r'$) if $r$ $\theta$-subsumes $r'$, resp. is more
general than $r'$ ($r < r'$) if $r \leq r'$ and $r' \nleq r$ (see e.g. [16]). In order to determine the
level of coverage the specializations of the rules in the LP under test are computed via 
specializing the rules with the test queries by standard unification. Then, via generalizing
these specializations under $\theta$-subsumption ordering, i.e. computing the lggs of all
successful specializations, a reconstruction of the original LP is attempted. The number
of successful "recoverings" then gives the level of test coverage, i.e. the level determines
those statements (rules) in a LP that have been executed/investigated through a test
run and those which have not. In particular, if the complete LP can be reconstructed
via generalization of the specializations then the test fully covers the LP. Formally we
express this as follows:

Let T be a test with a set of test queries $T := \{Q_1?, \ldots, Q_n?\}$ for a program
P. Then T is a cover for a rule $r_i \in P$ if $lgg(r_i') \simeq r_i$ under
$\theta$-subsumption, where $\simeq$ is an equivalence relation denoting variants of
clauses/terms and the $r_i'$ are the specializations of $r_i$ by a query $Q_i \in T$.
T is a cover for a program P if T is a cover for each rule $r_i \in P$. With this
definition it can be determined whether a test covers a LP or not. The coverage measure
for a LP P is then given by the number of covered rules $r_i$ divided by the number k
of all rules in P:

$$cover_P(T) := \frac{|\{ r_i \in P : T \text{ is a cover for } r_i \}|}{k}$$

For instance, consider the following simplified business policy P: 

discount(Customer, 10%) :- gold(Customer).
gold(Customer) :- spending(Customer, Value), Value > 3000.
spending('Moor', 5000). spending('Do', 4000). % facts

Let T = {discount('Moor', 10%)? => true, discount('Do', 10%)? => true} be a test with
two test queries. The set of directly derived specializations obtained by applying these
tests to P is:

discount('Moor', 10%) :- gold('Moor').
discount('Do', 10%) :- gold('Do').

The computed lgg of these specializations is:

discount(Customer, 10%) :- gold(Customer).

Accordingly, the coverage of P is 50%. We extend T with the additional test goals
{gold('Moor')? => true, gold('Do')? => true}. This leads to two new specializations:

gold('Moor') :- spending('Moor', Value), Value > 3000.
gold('Do') :- spending('Do', Value), Value > 3000.

The additional lgg is then:

gold(Customer) :- spending(Customer, Value), Value > 3000.

T now covers P, i.e. coverage = 100%. 
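
The coverage computation for this example can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the ContractLog implementation: terms are encoded as nested tuples, the specializations are written down by hand exactly as listed above instead of being derived by unification, and the clause lgg is computed position-wise, which is sufficient only because all specializations stem from the same rule. All function names (anti_unify, lgg_clause, variant, coverage) are hypothetical.

```python
from functools import reduce

def is_var(term):
    """Variables are strings starting with an upper-case letter (Prolog convention)."""
    return isinstance(term, str) and term[:1].isupper()

def anti_unify(s, t, table):
    """Least general generalization of two terms by syntactic anti-unification."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(anti_unify(a, b, table) for a, b in zip(s[1:], t[1:]))
    # Differing subterms: the same pair is always mapped to the same fresh variable.
    return table.setdefault((s, t), f"V{len(table)}")

def lgg_clause(c1, c2):
    """Position-wise lgg of two clauses (head, body) that specialize the same rule."""
    table = {}
    head = anti_unify(c1[0], c2[0], table)
    body = tuple(anti_unify(x, y, table) for x, y in zip(c1[1], c2[1]))
    return (head, body)

def normalize(term, names):
    """Rename variables in order of first occurrence so that variants become equal."""
    if is_var(term):
        return names.setdefault(term, f"_{len(names)}")
    if isinstance(term, tuple):
        return (term[0],) + tuple(normalize(a, names) for a in term[1:])
    return term

def variant(c1, c2):
    """True iff the two clauses are equal up to a renaming of variables."""
    def norm(clause):
        names = {}
        head = normalize(clause[0], names)
        return (head, tuple(normalize(lit, names) for lit in clause[1]))
    return norm(c1) == norm(c2)

def coverage(program, specializations):
    """Fraction of rules whose lgg over their specializations is a variant of the rule."""
    covered = 0
    for i, rule in enumerate(program):
        specs = specializations.get(i, [])
        if specs and variant(reduce(lgg_clause, specs), rule):
            covered += 1
    return covered / len(program)

# Policy P from the text (rules only; the facts are not counted as rules here).
DISCOUNT = (("discount", "Customer", "10%"), (("gold", "Customer"),))
GOLD = (("gold", "Customer"),
        (("spending", "Customer", "Value"), (">", "Value", 3000)))
P = [DISCOUNT, GOLD]

def spec_discount(c):
    return (("discount", c, "10%"), (("gold", c),))

def spec_gold(c):
    return (("gold", c), (("spending", c, "Value"), (">", "Value", 3000)))

T1 = {0: [spec_discount("'Moor'"), spec_discount("'Do'")]}
T2 = {**T1, 1: [spec_gold("'Moor'"), spec_gold("'Do'")]}

print(coverage(P, T1))  # 0.5
print(coverage(P, T2))  # 1.0
```

A single specialization of a rule would not count as a cover in this sketch, because the constant it introduces is never generalized back into a variable; at least two differing instances per rule are needed.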

The coverage measure determines how much of the information represented by the rules
has already been investigated by the actual tests. The actual lggs give feedback on how
to extend the set of test goals in order to increase the coverage level. Moreover,
repeatedly measuring the test coverage each time the rule base is updated (e.g. when
new rules are added) keeps the test suites (sets of TCs) up to acceptable testing
standards, and one can be confident that there will be only minimal problems during
runtime of the LP, because the rules do not only pass their tests but are also well
tested. In contrast to other orderings for computing the lggs, such as implication (i.e.
a stronger ordering relationship), which becomes undecidable if functions are used,
$\theta$-subsumption has nice computational properties and works for simple terms as
well as for complex terms with or without negation, e.g. p :- q(f(a)) is a
specialization of p :- q(X). It must be noted that the resulting clause under
generalization with $\theta$-subsumption ordering may turn out to be redundant, i.e. it
may be possible to find a shorter equivalent clause. However, this redundancy can be
reduced, and since we are only generalizing the specializations on the top level, this
reduction is computationally adequate. Thus, $\theta$-subsumption and least general
generalization qualify as the right framework of generality for our test coverage
notion.
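
The generality test itself is easy to state in code. The following is a minimal sketch of a $\theta$-subsumption check, assuming the same nested-tuple term encoding as before, single-headed clauses given as (head, body), and one-way matching in which the variables of the more specific clause are treated as constants. The names match and theta_subsumes are hypothetical and not part of any ILP system.

```python
def is_var(term):
    """Variables are strings starting with an upper-case letter (Prolog convention)."""
    return isinstance(term, str) and term[:1].isupper()

def match(pattern, term, subst):
    """One-way matching: extend subst so that pattern under subst equals term.

    Returns the extended substitution, or None if no such extension exists.
    """
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(pattern, tuple):
        if not (isinstance(term, tuple) and len(term) == len(pattern)
                and term[0] == pattern[0]):
            return None
        for p, t in zip(pattern[1:], term[1:]):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None

def theta_subsumes(general, specific):
    """True iff some substitution maps `general` onto a subset of `specific`."""
    (head_g, body_g), (head_s, body_s) = general, specific
    start = match(head_g, head_s, {})
    if start is None:
        return False

    def cover(literals, subst):
        # Backtracking search: every literal of the general body must match
        # some literal of the specific body under one common substitution.
        if not literals:
            return True
        first, rest = literals[0], literals[1:]
        for candidate in body_s:
            extended = match(first, candidate, subst)
            if extended is not None and cover(rest, extended):
                return True
        return False

    return cover(list(body_g), start)

# p :- q(X) is more general than p :- q(f(a)).
GENERAL = (("p",), (("q", "X"),))
SPECIFIC = (("p",), (("q", ("f", "a")),))
print(theta_subsumes(GENERAL, SPECIFIC))  # True
print(theta_subsumes(SPECIFIC, GENERAL))  # False
```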

Although the defined coverage measure is based on the central concept of unification
and uses ILP techniques for generalization of the derived specializations of the rule
base, it is worth noting that the measure might also be applied in the context of
forward-directed reactive rules such as ECA rules or production rules. There are several
approaches in the active database domain which transform active rules into LP derivation
rules in order to exploit the formal declarative semantics of logic programs to overcome
confluence and termination problems of active rule execution sequences, where the
actions are input events of further active rules [28,29,30]. For such transformed
declarative rule bases consisting of LP derivation rules, test cases can be written and
the coverage can be computed as described above. The combination of deductive and active
rules has also been investigated in different approaches, mainly based on the simulation
of active rules by means of deductive rules [31,32,33]. Moreover, there are approaches
which directly build reactive rules on top of LP derivation rules, such as the Event
Condition Action Logic Programming language (ECA-LP), which enables a homogeneous
representation of ECA rules and derivation rules [37,38]. Closely related are also
logical update languages such as transaction logics and in particular serial Horn
programs, where the serial Horn rule body is a sequential execution of actions in
combination with standard Horn pre-/post-conditions [34]. These serial rules can be
processed top-down or bottom-up and hence are closely related to the production rule
style of condition update action. This partial relation between backward-reasoning LP
derivation rules and forward-reasoning production rules, which enables transformations
of production rule bases into logic programs, has also been shown for a subclass of
production rules, the stratified production rules. Hence, this class of production rules
also qualifies for our goal-driven testing approach and unification-based test coverage
measure [36,35].

5 Test-driven V&V of Rule Engines and Rule Interchange 

Typical rule-based B2B contracts or service-oriented policies are managed and
maintained in a distributed environment where the rules and data are interchanged over
domain and system boundaries using more or less standardized rule markup interchange
formats, e.g. RuleML, SWRL, RBSLA, RIF. The interchanged rules/LPs need to be
interpreted and correctly executed in the target environment, i.e. in a target
rule/inference engine, which might be provided as an open (Web) service by a third-party
provider or a standardization body such as OMG or W3C (see [17]). Obviously, the correct
execution of the interchanged LP depends on the semantics of both the LP and the
inference engine (IE). TCs, which are interchanged together with the LP, can be used to
test whether the LP still behaves as intended in the target environment.

To address these issues the IE, the interchanged LP and the provided TCs must reveal
their (intended resp. implemented) semantics. This might be solved with explicit meta
annotations based on a common vocabulary, e.g. a (Semantic Web) ontology which
classifies semantics such as COMP (completion semantics), STABLE (stable model) and
WFS (well-founded) and relates them to classes of LPs such as positive definite LPs,
stratified LPs, normal LPs, extended LPs and disjunctive LPs. The ontology can then be
used to describe additional meta information about the semantics and logic class of the
interchanged rules and TCs and to find appropriate IEs to correctly and efficiently
interpret and execute the LP, e.g. (1) by configuring the rule engine for a particular
semantics in case it supports different ones (see e.g. the configurable ContractLog IE),
(2) by executing an applicable variant of several interchanged semantics alternatives of
the LP or (3) by automatic transformation approaches which transform the interchanged
LP into an executable LP. However, we do not believe that each rule engine vendor will
annotate its implementation with such meta information, even when there is an official
standard Semantic Web ontology on hand (e.g. released by OMG or W3C). Therefore, means
to automatically determine the supported semantics of IEs or LPs are needed. As we will
show, TCs can be extended to meta test programs which test typical properties of
well-known semantics and which, by the combination of succeeded and failed meta tests,
uniquely determine the unknown semantics of the target environment.

A great variety of semantics for LPs (LP-semantics) and non-monotonic reasoning
(NMR-semantics) have been developed in the past decades. For an overview we refer to
[13]. In general, there are three ways to determine the semantics (and hence the IE) to
be used for execution: (1) by its complexity and expressiveness class (which are in a
trade-off relation to each other), (2) by its runtime performance or (3) by the semantic
properties it should satisfy. A generally accepted criterion as to why one semantics
should be used over another does not exist, but two main competing approaches, namely
WFS and STABLE, have been broadly accepted as declarative semantics for normal LPs.

For a discussion of the worst-case complexity and expressiveness of several classes of
LPs we refer to [18]. Based on these worst-case complexity results for different
semantics and expressive classes of LPs, which might be published in a machine
interpretable format (Semantic Web ontology) for automatic decision making, certain
semantics might already be excluded as unusable for a particular rule-based
policy/contract application. However, asymptotic worst-case results are not always
appropriate to quantify performance and scalability of a particular rule execution
environment, since implementation specifics of an IE such as the use of inefficient
recursions or memory structures might lead to low performance or memory overflows in
practice. TCs can be used to measure the runtime performance and scalability for
different outcomes of a rule set given a certain test fact base as input. In this way
certain points of attention, e.g. long computations, loops or deeply nested derivation
trees, can be identified and a refactoring of the rule code (e.g. reordering rules,
narrowing rules, deleting rules etc.) can be attempted [19]. We call this dynamic
testing in contrast to functional testing. Dynamic TCs with maximum time values (time
constraints) are defined as an extension to functional TCs (see section 3):
$TC = A \cup \{Q \Rightarrow R : \theta < MS\}$, where MS is a maximum time constraint
for the test query Q. If the query was not successful within this time frame, the test
is said to have failed. For instance, consider the dynamic TC
$TC_{dyn}$: q(a)? => true < 1000ms. The test succeeds iff the test query succeeds and
the answer is computed in less than 1000 milliseconds.
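
A dynamic TC of this kind can be sketched as a timed assertion. The sketch below is an illustration only: the query is modelled as an arbitrary zero-argument Python callable instead of a call into a rule engine, the function name run_dynamic_test is hypothetical, and the elapsed time is checked after the query has returned, so a non-terminating query is not aborted.

```python
import time

def run_dynamic_test(query, expected, max_ms):
    """Run `query` and check both its answer and its running time.

    Returns (passed, elapsed_ms). The test passes iff the answer equals
    `expected` and was computed in less than `max_ms` milliseconds.
    """
    start = time.perf_counter()
    result = query()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    passed = (result == expected) and (elapsed_ms < max_ms)
    return passed, elapsed_ms

# Stand-in for the test fact base: q(a) holds.
FACTS = {("q", "a")}

def query_q_a():
    return ("q", "a") in FACTS

passed, elapsed = run_dynamic_test(query_q_a, True, 1000)
print(passed)  # True
```

An engine that must enforce the time limit strictly would have to run the query in a separate thread or process and cancel it on timeout; that is left out of the sketch.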

To define a meta ontology of semantics and LP classes (represented as an OWL ontology,
see [20] for more details) which can be used to meta annotate the interchanged policy
LPs, the IEs and the TCs, we draw on the general semantics classification theory
developed by J. Dix [14,15]. Typical top-level LP classes are, e.g., definite LPs,
stratified LPs, normal LPs, extended LPs and disjunctive LPs. Well-known semantics for
these classes are, e.g., least and supported Herbrand models, 2- and 3-valued COMP, WFS,
STABLE, generalized WFS etc. Given the information to which class a particular LP
belongs or which is the intended semantics of this LP, and given the information which
semantics is implemented by the (target) IE, it is straightforward to decide whether the
LP can be executed by the IE at all. In short, a LP cannot be executed by an IE if the
IE derives fewer literals than the intended SEM for which the LP was designed would do,
i.e. $SEM'(IE) > SEM(P)$, or if the semantics implemented by the IE is not adequate for
the program, i.e. $SEM'(IE) \neq SEM(P)$. This information can be given by meta
annotations, e.g., class: defines the class of the LP / IE; semantics: defines the
semantics of the LP / IE; syntax: defines the rule language syntax.

In the context of rule interchange with open, distributed IEs, which might be provided
as public services, an important question is whether the IE correctly implements a
semantics. Meta TCs can be used for V&V of the interchanged LP in the target environment
and therefore establish trust in this service. Moreover, meta TCs checking general
properties of semantics can also be used to verify and determine the semantics of the
target IE in case it is unknown (not given by meta annotations). Kraus et al. [21] and
Dix [14,15] proposed several weak and structural (strong) properties for arbitrary
(non-monotonic) semantics, e.g.:

Strong Properties

- Cumulativity: If $U \subseteq V \subseteq SEM_P^{scept}(U)$, then
$SEM_P^{scept}(U) = SEM_P^{scept}(V)$, where U and V are sets of atoms and
$SEM_P^{scept}$ is an arbitrary sceptical semantics for the program P, i.e. if
$a \mid\sim b$ then $a \mid\sim c$ iff $(a \wedge b) \mid\sim c$.

- Rationality: If $U \subseteq V$ and
$V \cap \{A : SEM_P^{scept}(U) \models \neg A\} = \emptyset$, then
$SEM_P^{scept}(U) \subseteq SEM_P^{scept}(V)$.

Weak Properties

- Elimination of Tautologies: If a rule $a \leftarrow b \wedge not\ c$ with
$a \cap b \neq \emptyset$ is eliminated from a program P, then the resulting program P'
is semantically equivalent: $SEM(P) = SEM(P')$. Here a, b, c are sets of atoms:
$P \mapsto P'$ iff there is a rule $H \leftarrow \mathcal{B} \in P$ such that
$H \in \mathcal{B}$ and $P' = P \setminus \{H \leftarrow \mathcal{B}\}$.

- Generalized Principle of Partial Evaluation (GPPE): If a rule
$a \leftarrow b \wedge not\ c$, where b contains an atom B, is replaced in a program P
by the n rules $a \cup (a^i - B) \leftarrow ((b - B) \cup b^i) \wedge not\ (c \cup c^i)$,
where $a^i \leftarrow b^i \wedge not\ c^i$ $(i = 1, \ldots, n)$ are all rules for which
$B \in a^i$, then $SEM(P) = SEM(P')$.

- Positive/Negative Reduction: If a rule $a \leftarrow b \wedge not\ c$ is replaced in a
program P by $a \leftarrow b \wedge not\ (c - C)$ (C is an atom), where C appears in no
rule head, or if a rule $a \leftarrow b \wedge not\ c$ is deleted from P because there is
a fact a' in P such that $a' \subseteq c$, then $SEM(P) = SEM(P')$:

1. Positive Reduction: $P \mapsto P'$ iff there is a rule $H \leftarrow \mathcal{B} \in P$
and a negative literal $not\ B \in \mathcal{B}$ such that $B \notin HEAD(P)$ and
$P' = (P \setminus \{H \leftarrow \mathcal{B}\}) \cup \{H \leftarrow (\mathcal{B} \setminus \{not\ B\})\}$.

2. Negative Reduction: $P \mapsto P'$ iff there is a rule $H \leftarrow \mathcal{B} \in P$
and a negative literal $not\ B \in \mathcal{B}$ such that $B \in FACT(P)$ and
$P' = P \setminus \{H \leftarrow \mathcal{B}\}$.

- Elimination of Non-Minimal Rules / Subsumption: If a rule $a \leftarrow b \wedge not\ c$
is deleted from a program P because there is another rule
$a' \leftarrow b' \wedge not\ c'$ such that $a' \subseteq a$, $b' \subseteq b$,
$c' \subseteq c$, where at least one $\subseteq$ is proper, then $SEM(P) = SEM(P')$:
$P \mapsto P'$ iff there are rules $H \leftarrow \mathcal{B}$ and
$H \leftarrow \mathcal{B}' \in P$ such that $\mathcal{B} \subset \mathcal{B}'$ and
$P' = P \setminus \{H \leftarrow \mathcal{B}'\}$.

- Consistency: $SEM(P) \neq \emptyset$ for all disjunctive LPs.

- Independence: For every literal L, L is true in every $M \in SEM(P)$ iff L is true in
every $M \in SEM(P \cup P')$, provided that the languages of P and P' are disjoint and L
belongs to the language of P.

- Relevance: The truth value of a literal L with respect to a semantics SEM(P) only
depends on the subprogram formed from the relevant rules of P (relevant(P)) with respect
to L: $SEM(P)(L) = SEM(relevant(P, L))(L)$.
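
Four of the weak properties correspond to simple program transformations. The following minimal sketch states them for normal programs only, i.e. rules with a single head atom, which is a simplification of the definitions above where heads are sets of atoms; GPPE is left out. The Rule type and all function names are hypothetical.

```python
from typing import FrozenSet, NamedTuple

class Rule(NamedTuple):
    """A normal rule: head <- pos_1, ..., pos_m, not neg_1, ..., not neg_n."""
    head: str
    pos: FrozenSet[str] = frozenset()
    neg: FrozenSet[str] = frozenset()

def heads(program):
    return {r.head for r in program}

def facts(program):
    return {r.head for r in program if not r.pos and not r.neg}

def eliminate_tautologies(program):
    """Drop rules whose head also occurs positively in their own body."""
    return {r for r in program if r.head not in r.pos}

def positive_reduction(program):
    """Remove 'not B' from rule bodies when B occurs in no rule head."""
    hs = heads(program)
    return {Rule(r.head, r.pos, frozenset(a for a in r.neg if a in hs))
            for r in program}

def negative_reduction(program):
    """Drop rules containing 'not B' in the body when B is a fact."""
    fs = facts(program)
    return {r for r in program if not (r.neg & fs)}

def eliminate_non_minimal(program):
    """Drop a rule if a different rule with the same head has a smaller body."""
    return {r for r in program
            if not any(o != r and o.head == r.head
                       and o.pos <= r.pos and o.neg <= r.neg
                       for o in program)}

# p <- not x, where x occurs in no head, reduces to the fact p.
print(positive_reduction({Rule("p", neg=frozenset({"x"}))}) == {Rule("p")})  # True
```

A semantics satisfies the corresponding property if it assigns the same meaning to a program before and after the transformation, which is what a meta TC built from these functions would compare in the target IE.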

The basic idea for applying these properties to the V&V as well as to the automated
determination of the semantics of arbitrary LP rule inference environments is to
translate known counter examples into meta TCs and apply them in the target IE. Such
counter examples, which show that certain semantics do not satisfy one or more of the
general properties, can be found in the literature. To demonstrate this approach we will
now give some examples derived from [14,15]. For a more detailed discussion of this
approach and more examples see [20]:



Example: STABLE is not Cautious

P:  a <- neg b          P': a <- neg b
    b <- neg a              b <- neg a
    c <- neg c              c <- neg c
    c <- a                  c <- a
                            c

T: {a?=>true, c?=>true}

STABLE(P) has {a, neg b, c} as its only stable model and hence it
derives 'a' and 'c', i.e. 'T' succeeds. By adding the derived atom
'c' we get another model for P', {neg a, b, c}, i.e. 'a' can no
longer be derived (i.e. 'T' now fails) and cautious monotonicity is
not satisfied.

Example: STABLE does not satisfy Relevance

P:  a <- neg b          P': a <- neg b
                            c <- neg c

T: {a?=>true}

The unique stable model of 'P' is {a}. If the rule 'c <- neg c' is
added, 'a' is no longer derivable because no stable model exists.
Relevance is violated, because the truth value of 'a' depends on
atoms that are totally unrelated to 'a'.
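
Both counter examples can be checked mechanically with a brute-force stable model enumerator based on the Gelfond-Lifschitz reduct. The sketch below is for small propositional programs only and is not how an IE would compute stable models. Two assumptions are made explicit: the first program is written with the additional rule c <- a (and P' with the added fact c), since this is the form under which the stated stable models come out, and a query counts as derivable only if at least one stable model exists and the atom holds in all of them, following the wording of the examples. The function names are hypothetical.

```python
from itertools import combinations

def least_model(definite_rules):
    """Least Herbrand model of a negation-free program given as (head, pos_body)."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def stable_models(program):
    """Enumerate stable models of rules (head, pos_body, neg_body) by brute force."""
    atoms = set()
    for head, pos, neg in program:
        atoms |= {head} | set(pos) | set(neg)
    atoms = sorted(atoms)
    models = []
    for size in range(len(atoms) + 1):
        for candidate in combinations(atoms, size):
            m = set(candidate)
            # Gelfond-Lifschitz reduct: drop rules blocked by m, then drop negation.
            reduct = [(h, set(p)) for h, p, n in program if not (set(n) & m)]
            if least_model(reduct) == m:
                models.append(m)
    return models

def derives(program, atom):
    """True iff a stable model exists and `atom` is true in every stable model."""
    models = stable_models(program)
    return bool(models) and all(atom in m for m in models)

# Cautious monotonicity (with the assumed rule c <- a).
P = [("a", set(), {"b"}), ("b", set(), {"a"}),
     ("c", set(), {"c"}), ("c", {"a"}, set())]
P_PRIME = P + [("c", set(), set())]
print(derives(P, "a"), derives(P, "c"))  # True True
print(derives(P_PRIME, "a"))             # False

# Relevance.
Q = [("a", set(), {"b"})]
Q_PRIME = Q + [("c", set(), {"c"})]
print(derives(Q, "a"))        # True
print(stable_models(Q_PRIME)) # []
```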

The initial "positive" meta TC is used to verify whether the (unknown) semantics
implemented by the IE provides the correct answers for this particular meta test
program. The "negative" TC is then used to evaluate whether the semantics of the IE
satisfies the property under test. Such meta test sets provide us with a tool for
determining an "adequate" semantics to be used for a particular rule-based
policy/contract application. Moreover, there is strong evidence that by taking both
kinds of properties together an arbitrary semantics might be uniquely determined, i.e.
by applying a meta test suite consisting of adequate meta TCs with typical counter
examples for these properties in an IE, we can uniquely determine the semantics of this
IE. Table 1 (derived from [14,15]) specifies for common semantics the properties that
they satisfy. The semantic principles described in this section are also very important
in the context of applying refactorings to LPs. In general, a refactoring of a rule base
should optimize the rule code without changing the semantics of the program. Removing
tautologies or non-minimal rules or applying positive/negative reductions are typical
steps in rule base refinements using refactorings [19], and the semantic equivalence
between the original and the refined program defined for these principles is therefore
an important prerequisite to safely apply a refactoring of this kind.

Table 1. General Properties of Semantics

6 Integration into Testing Frameworks and RuleML 

We have implemented the test-driven approach in the ContractLog KR [20]. The
ContractLog KR [1] is an expressive and efficient KR framework developed in the RBSLA
project [2] and hosted at Sourceforge for the representation of contractual rules,
policies and service level agreements. It implements several logical formalisms such as
event logics, defeasible logic, deontic logics and description logic programs in a
homogeneous LP framework as meta programs. TCs in the ContractLog KR are homogeneously
integrated into LPs and are written in an extended ISO Prolog related scripting syntax
called Prova. A TC script consists of (1) a unique test case ID denoted by the function
testcase(ID), (2) optional input assertions such as input facts and test rules which are
added temporarily to the KB as partial modules by expressive ID-based update functions,
(3) a positive meta test rule defining the test queries and variable bindings
testSuccess(Test Name, Optional Message for JUnit), (4) a negative test rule
testFailure(Test Name, Message) and (5) a runTest rule.

% testcase oid
testcase("./examples/tc1.test").
% assertions via ID-based updates adding one rule and two facts
:-solve(update("tc1.test","a(X):-b(X). b(1). b(2).")).
% positive test with success message for JUnit report
testSuccess("test1","succeeded"):-
    testcase("./examples/tc1.test"), testQuery(a(1)).
% negative test with failure message for JUnit report
testFailure("test1","can not derive a"):-
    not(testSuccess("test1",Message)).
% define the active tests - used by meta program
runTest("./examples/tc1.test"):-testSuccess("test1",Message).

A TC can be temporarily loaded into and removed from the KB for testing purposes, using
expressive ID-based update functions for dynamic LPs [20]. The TC meta program
implements various functions, e.g., to define positive and negative test queries
(testQuery, testNotQuery, testNegQuery), expected answer sets (variable bindings:
testResults) and quantifications on the expected number of results
(testNumberOfResults). It also implements the functions to compute the clause/term
specializations (specialize) and generalizations (generalize) as well as the test
coverage (cover). To prove integrity constraints we have implemented another LP meta
program in the ContractLog KR with the main test axioms:

- testIntegrity() tests the integrity of the actual program, i.e. it proves all integrity
constraints in the knowledge base, using them as goals constraining the facts and rules
in the KB.

- testIntegrity(Literal) tests the integrity of the literal, i.e. it makes a hypothetical
test and proves whether the literal, which is actually not in the KB, violates any
integrity constraint in the KB.

The first integrity test is useful to verify (test logical integrity) and validate (test
application/domain integrity) the integrity of the actual knowledge state. The second
integrity test is useful to hypothetically test an intended knowledge update, e.g. to
test whether a conclusion from a rule (the literal denotes the rule head) will lead to
violations of the integrity of the program. Similar sets of test axioms are provided in
ContractLog for temporarily loading, executing and unloading TCs from external scripts
at runtime.

In order to become widely accepted and usable by a broad community of policy engineers
and practitioners, existing expertise and tools in traditional SE and flexible
information system (IS) development should be adapted to the declarative test-driven
programming approach. Well-known test frameworks like JUnit facilitate a tight
integration of tests into code and allow for automated testing and reporting in existing
IDEs such as Eclipse via automated Ant tasks. The RBSLA/ContractLog KR implements
support for JUnit-based testing and test coverage reporting, where TCs can be managed in
test suites (represented as LP scripts) and automatically run by a JUnit Ant task which
creates a final JUnit and test coverage report. The RBSLA/ContractLog distribution comes
with a set of functional, regression, performance and meta TCs for the V&V of the
inference implementations, semantics and meta programs of the ContractLog KR w.r.t.
general semantics properties and typical adequacy criteria of KR formalisms (in
particular w.r.t. completeness, soundness, expressiveness and efficiency/scalability).

To support distributed management and rule interchange we have integrated TCs into
RuleML (RuleML 0.9). The Rule Markup Language (RuleML) is a standardization initiative
with the goal of creating an open, producer-independent XML/RDF based web language for
rules. The Rule Based Service Level Agreement markup language (RBSLA) [5], which has
been developed for the serialization of rule based contracts, policies and SLAs,
comprises the test case layer together with several other layers extending RuleML with
modelling constructs for e.g. defeasible rules, deontic norms, temporal event logics and
reactive ECA rules. The markup serialization syntax for test suites / test cases
includes the following constructs given in EBNF notation, i.e. alternatives are
separated by vertical bars (|), zero to one occurrences are written in square brackets
([]) and zero to many occurrences in braces ({}):

assertions ::= And
test ::= Test | Query
message ::= Ind | Var
TestSuite ::= [oid,] content | And
TestCase ::= [oid,] {test | Test,}, [assertions | And]
Test ::= [oid,] [message | Ind | Var,] test | Query, [answer | Substitutions]
Substitutions ::= {Var, Ind | Cterm}

Example:

<TestCase @semantics="semantics:STABLE" class="class:Propositional">
  <Test @semantics="semantics:WFS" @label="true">
    <Ind>Test 1</Ind><Ind>Test 1 failed</Ind>
    <Query>
      <And>
        <Atom><Rel>p</Rel></Atom>
        <Naf><Atom><Rel>q</Rel></Atom></Naf>
      </And>
    </Query>
  </Test>
</TestCase>

The example shows a test case with the test: test1: {p => true, not q => true}.
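
On the receiving side, such a serialized test case has to be read back into test queries. The following minimal sketch uses Python's standard xml.etree.ElementTree module and assumes a well-formed variant of the example in which the attributes are plain XML attribute names (semantics, class, label) without the leading @, since @ is not allowed in XML names. The function read_test_case and the shape of the returned dictionary are hypothetical and not part of RuleML or RBSLA.

```python
import xml.etree.ElementTree as ET

DOC = """
<TestCase semantics="semantics:STABLE" class="class:Propositional">
  <Test semantics="semantics:WFS" label="true">
    <Ind>Test 1</Ind><Ind>Test 1 failed</Ind>
    <Query>
      <And>
        <Atom><Rel>p</Rel></Atom>
        <Naf><Atom><Rel>q</Rel></Atom></Naf>
      </And>
    </Query>
  </Test>
</TestCase>
"""

def read_test_case(xml_text):
    """Extract annotations and test goals from a serialized test case."""
    root = ET.fromstring(xml_text)
    tests = []
    for test in root.findall("Test"):
        # The first Ind is read as the test name, the second as the failure message.
        name, message = [ind.text for ind in test.findall("Ind")][:2]
        goals = []
        for literal in test.find("Query/And"):
            if literal.tag == "Naf":
                goals.append("not " + literal.find("Atom/Rel").text)
            else:
                goals.append(literal.find("Rel").text)
        tests.append({"name": name, "message": message,
                      "semantics": test.get("semantics"), "goals": goals})
    return {"semantics": root.get("semantics"),
            "class": root.get("class"),
            "tests": tests}

tc = read_test_case(DOC)
print(tc["tests"][0]["goals"])  # ['p', 'not q']
```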



7 Related Work and Conclusion 

V&V of KB systems, and in particular of rule based systems such as LPs with Prolog
interpreters, received much attention from the mid '80s to the early '90s. Several V&V
methods have been proposed, such as methods based on operational debugging via
instrumenting the rule base and exploring the execution trace, tabular methods, which
pairwise compare the rules of the rule base to detect relationships among premises and
conclusions, methods based on formal graph theory or Petri nets, which translate the
rules into graphs or Petri nets, methods based on declarative debugging, which build an
abstract model of the LP and navigate through it, or methods based on algebraic
interpretation, which transform a KB into an algebraic structure, e.g. a boolean
algebra, which is then used to verify the KB. As discussed in section 1, most of these
approaches are inherently complex and are not suited for the policy resp. contract
domain. Much research has also been directed at the automated refinement of rule bases,
e.g. [19,22], and at the automatic generation of test cases. Overviews of rule base
debugging tools are available as well. There are only a few attempts addressing test
coverage measurement for test cases of backward-reasoning rule based programs
[25,27,26].

Test cases for rule based policies are particularly well suited when policies/contracts
grow larger and more complex and are maintained, possibly distributed and interchanged,
by different people. In this paper we have attempted to bridge the gap between the
test-driven techniques developed in the Software Engineering community, on the one hand,
and the declarative rule based programming approach for engineering high-level policies
such as SLAs, on the other hand. We have elaborated on an approach using logic
programming as a common basis and have extended this test-driven approach with the
notion of test coverage, integrity tests, functional and dynamic tests and meta tests
for verifying the inference environments and their semantics properties in an open
distributed environment such as the (Semantic) Web. In addition to the homogeneous
integration of test cases into LP languages we have introduced a markup serialization as
an extension to the emerging Semantic Web Rule Markup Language RuleML which, e.g.,
facilitates rule interchange. We have implemented our approach in the ContractLog KR
[1], which is based on the Prova open-source rule environment, and applied the agile
test-driven values and practices successfully in the rule based SLA (RBSLA) project for
the development of complex, distributed SLAs. Clearly, test cases and test-driven
development are not a replacement for good programming practices and rule code review.
However, the presence of test cases helps to safeguard the life cycle of policy/contract
rules, e.g. by enabling V&V at design/development time but also dynamic testing at
runtime. In general, the test-driven approach follows the well-known 80-20 rule, i.e.
increasing the approximation level of the intended semantics of a rule set (a.k.a. test
coverage) by finding new adequate test cases becomes more and more difficult, with new
tests delivering less and less incremental benefit. Hence, from a cost-benefit
perspective one has to find a break-even point and apply a not too defensive development
strategy to reach practical levels of rule engineering and testing in larger rule based
policy or contract projects.